1. INTRODUCTION

According to the latest statistics of the Directorate of Roads of the Moroccan Ministry of Equipment,
Transport, Logistics and Water, drowsiness and falling asleep at the wheel cause 33.3% of deadly highway
accidents in Morocco [1], [2]. These statistics motivated us to develop an automatic model that can detect
drowsiness as it occurs, before the situation worsens and leads to a dangerous accident. The idea of such a
system is not new; our aim is to improve on the performance of the existing systems and to address their
limitations. To that end we use Python for the processing, combine time-domain and frequency-domain
processing techniques with machine learning (ML) algorithms, and build a hybrid, automatic method for
detecting drowsiness from a single channel of electroencephalogram (EEG) signals [3]. Our model, based on
an optimized decision tree (DT) classifier, achieves a higher accuracy and a shorter execution time than our
previous model and than the previous works we compare against. Our previous study (conference paper,
publication in progress) aimed to design an efficient model based on an extensive analysis; during that
period we carried out a detailed study of the existing systems and their limitations.

As discussed in our previous work, the existing systems rely on sensors only, on physiological signals
such as EEG, electrocardiogram (ECG), and electro-oculogram (EOG) [4]-[8], or on a combination of the
two [9]. Chang et al. [10] proposed a smart glasses system that detects drowsiness using signals generated
by accelerometers and gyroscopes, capturing the head's micro-falls, in addition to an infrared transceiver
for capturing the blinking frequency and the degree of eye closure. Other works detect drowsiness using
facial recognition or eye-region detection [11], [12], or thermal imaging techniques [13]. However, as we
noted in our previous work, signals from sensors alone, without physiological signals, are neither accurate
nor sufficient to confirm the detection, because a driver's blinking, eye closure, or head movement are
ordinary, spontaneous actions. The solution is therefore to rely on recorded physiological signals such as
EEG, ECG, and EOG; we use EEG signals in our work [14]-[18]. To situate our work, we compare it with the
following studies, which also process a single channel and use the same EEG dataset from the Physionet
database, in order to show the improvement brought by our hybrid method. Belakhdar et al. [19] proposed a
technique that analyses the spectral domain of the EEG signals using MATLAB, applying the Fourier
transform and an artificial neural network (ANN) classifier. Their work reached an accuracy of 88.8%.
Bajaj et al. [20] reached an accuracy of 91% using the tunable Q-factor wavelet transform (TQWT) applied
to the EEG signals and the extreme learning machine (ELM) classifier. The highest accuracy, 94.45%, was
reached by [21] using features from the wavelet packet transform (WPT) fed to an extra-trees classifier.

The proposed work aims to improve the speed and accuracy of our previous drowsiness detection
algorithm, using a customized and optimized DT classifier that we describe below. The method proposed in
this paper is a new, optimized hybrid algorithm for detecting driver drowsiness that combines temporal and
frequency domain features extracted from a single channel of EEG recordings (FP1). The FP1 position has
been reported as the most accurate position for detecting drowsiness [22]. Our proposed method is shown in
Figure 1.


[Figure 1 flowchart blocks: EEG Signals Acquisition; Features Extraction (Temporal, PSD Domain); Features
Selection & Vectorization; Classification Models; Confusion Matrix; New Data; Predictive Model]

Figure 1. Flowchart of our proposed method
2. METHOD
2.1. Pre-processing Phase 

We used the open Physionet database in this work, as it is publicly available and is also used by the
single-channel studies we compare against. All the EEG records were artefact-free and had been noise
filtered right after acquisition using a 30 Hz low-pass filter and a 50 Hz notch filter. The signals were
recorded under the international 10-20 system from male and female subjects of different ages [23], [24].


2.2. Time segmentation phase 

We split the EEG signals into segments of 3 seconds instead of using the whole 30-second recording.
The first benefit of this time segmentation is that the signal can be treated as stationary within each
segment for the spectral analysis (fast Fourier transform (FFT) and power spectral density (PSD)
analysis). The second is that it supports real-time operation, so that detecting the drowsy state does not
take too long.
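
The segmentation step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `segment_signal` is hypothetical, the windows are assumed to be non-overlapping, and the 100 Hz sampling frequency is the value listed for this work in Table 2.

```python
def segment_signal(samples, fs=100, window_s=3):
    """Split a 1-D sequence of samples into non-overlapping windows.

    fs is the sampling frequency in Hz (100 Hz assumed here); trailing
    samples that do not fill a complete window are dropped.
    """
    window = int(fs * window_s)
    if window <= 0:
        raise ValueError("window must contain at least one sample")
    n_segments = len(samples) // window
    return [samples[i * window:(i + 1) * window] for i in range(n_segments)]


# A hypothetical 30 s recording at 100 Hz (3000 samples) yields ten 3 s segments.
recording = [0.0] * 3000
segments = segment_signal(recording)
```

Each returned segment then goes through feature extraction on its own, which is what keeps the per-decision latency short.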


2.3. Features Extraction Phase 

This step extracts the most significant features from a single EEG channel in three combined domains
(temporal, Fourier and spectral). We designed a function that extracts the features one by one and scales
them into the shape required by the classification step. The combination of features was not chosen
arbitrarily: our analysis showed that this mixture gives the highest accuracies.


2.3.1. Temporal domain analysis 

Eight parameters are calculated in the time domain to distinguish the awake state from the drowsy
state. Processing the signal in intervals of 3 seconds, we implemented the feature calculations ourselves
and kept the features that gave the best model accuracies. These features are the minima, the maxima, the
amplitude peaks and our own mean-of-amplitude-peaks parameter, in addition to the following ones, where
$x_1, \dots, x_N$ are the samples of a segment:


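The median $m$:

$$P(X \le m) = P(X \ge m) \tag{1}$$

The mean:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{2}$$

The variance:

$$Var = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N} \tag{3}$$

The standard deviation:

$$Std = \sqrt{Var} \tag{4}$$

The root-mean-square:

$$RMS = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2} \tag{5}$$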
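
As an illustration of the time-domain features listed above, the sketch below computes them for one segment using only the Python standard library. It is not the authors' function: the name `time_domain_features` and the dictionary keys are hypothetical, the variance is assumed to be the population form (divided by the number of samples), and an amplitude peak is assumed to be a sample strictly larger than both of its neighbours, since the source does not define it.

```python
import math
import statistics


def time_domain_features(segment):
    """Return a dict of time-domain features for one segment (sequence of floats)."""
    n = len(segment)
    if n < 3:
        raise ValueError("need at least three samples")
    # Assumption: an amplitude peak is a sample strictly larger than both neighbours.
    peaks = [segment[i] for i in range(1, n - 1)
             if segment[i - 1] < segment[i] > segment[i + 1]]
    variance = statistics.pvariance(segment)  # population variance (divides by N)
    return {
        "min": min(segment),
        "max": max(segment),
        "n_peaks": len(peaks),
        "mean_peak": statistics.fmean(peaks) if peaks else 0.0,
        "median": statistics.median(segment),
        "mean": statistics.fmean(segment),
        "var": variance,
        "std": math.sqrt(variance),
        "rms": math.sqrt(sum(x * x for x in segment) / n),
    }


features = time_domain_features([0.0, 2.0, 1.0, 3.0, -1.0])
```

For the five-sample example, the two peaks are 2.0 and 3.0, so the mean of the amplitude peaks is 2.5, the mean is 1.0 and the variance is 2.0.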


2.3.2. Fourier and power spectral domain analysis 

In this phase, we carried out a frequency analysis of the recorded EEG signals using the fast Fourier
transform. After extracting the same features as in the time domain, we take their modulus, so that the
complex values are reduced to real magnitudes.


For $0 \le k \le N-1$:

$$X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i\, kn/N} \tag{6}$$
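
The discrete Fourier transform and its modulus can be illustrated with a direct evaluation of the sum, written here with the standard library only. This is a sketch for clarity rather than the authors' implementation: the function name `dft_modulus` is hypothetical, and a fast Fourier transform routine computes the same values far more efficiently.

```python
import cmath
import math


def dft_modulus(x):
    """Modulus |X_k| of the discrete Fourier transform of x, for k = 0..N-1."""
    n_samples = len(x)
    moduli = []
    for k in range(n_samples):
        x_k = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        moduli.append(abs(x_k))
    return moduli


# A cosine with exactly one cycle over 8 samples puts its energy in bins 1 and 7.
signal = [math.cos(2 * math.pi * n / 8) for n in range(8)]
spectrum = dft_modulus(signal)
```

Taking `abs` of each complex coefficient is what turns the transform into a real-valued vector on which the same statistics as in the time domain can be computed.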


The power in the different brain-wave frequency bands is then estimated with the Burg algorithm
(spectrum analysis) and compared, to allow a good discrimination between the awake and drowsy states.
$$PSD_k = \frac{1}{N}\left|\sum_{n=0}^{N-1} x_n\, e^{-2\pi i\, kn/N}\right|^2 \tag{7}$$
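
A band-power comparison of this kind can be sketched as below. Note the substitutions: this example uses a plain periodogram (squared DFT modulus divided by the number of samples) rather than the Burg estimator named in the text, and the band edges in `BANDS` are conventional EEG values assumed for illustration, since the source does not list them. The names `periodogram` and `band_powers` are hypothetical.

```python
import cmath
import math

# Assumed conventional EEG band edges in Hz; the source does not list them.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def periodogram(x, fs):
    """Return (frequencies, power) for bins 0..N//2 using |X_k|^2 / N."""
    n_samples = len(x)
    freqs, power = [], []
    for k in range(n_samples // 2 + 1):
        x_k = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        freqs.append(k * fs / n_samples)
        power.append(abs(x_k) ** 2 / n_samples)
    return freqs, power


def band_powers(x, fs=100, bands=BANDS):
    """Sum the periodogram bins falling in each band (low <= f < high)."""
    freqs, power = periodogram(x, fs)
    return {name: sum(p for f, p in zip(freqs, power) if low <= f < high)
            for name, (low, high) in bands.items()}


# A hypothetical 3 s, 100 Hz segment containing a pure 10 Hz (alpha) oscillation.
segment = [math.sin(2 * math.pi * 10 * n / 100) for n in range(300)]
powers = band_powers(segment)
```

For this synthetic segment nearly all of the power falls in the alpha band; on real recordings, the relative powers of the bands are the quantities that would be compared between the awake and drowsy states.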


2.4. Features selection & classification 

A total of eight ML classification methods were tested in our study, first to compare their efficiency
and keep the best model, and second to select the most appropriate features. Our optimized model showed
the best accuracy and time performance. The classifiers we compared are the Gaussian process (GP),
K-nearest neighbors (KNN), multilayer perceptron (MLP), support vector machine (SVM) (with its four
kernels), our previous DT classifier, and finally the proposed optimized DT.


3. RESULTS AND DISCUSSION 

After extracting the features, all the calculated parameters were scaled and passed to the ML
classifiers. The evaluation of these classifiers relies on four quantities, where X denotes the actual
state: i) True positive (TP): the prediction is positive (drowsy state is predicted) and X is drowsy;
ii) True negative (TN): the prediction is negative (awake state is predicted) and X is awake; iii) False
positive (FP): the prediction is positive (drowsy state is predicted) and X is awake; and iv) False
negative (FN): the prediction is negative (awake state is predicted) and X is drowsy. Based on these
quantities we calculate our different scoring outputs:


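$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$

$$\mathrm{Sensitivity\ (Recall)} = \frac{TP}{TP + FN} \tag{10}$$

$$\mathrm{F1\_score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$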
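
The four scores can be computed directly from the confusion-matrix counts, as in the sketch below. It uses the standard definitions (in particular, recall is TP divided by TP plus FN); the function name `classification_scores` and the counts in the example are hypothetical and are not results from this study.

```python
def classification_scores(tp, tn, fp, fn):
    """Precision, accuracy, recall and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "accuracy": accuracy,
            "recall": recall, "f1": f1}


# Hypothetical counts, with the drowsy state taken as the positive class.
scores = classification_scores(tp=90, tn=85, fp=10, fn=15)
```

With these counts the precision is 0.9 and the accuracy is 0.875, while the recall is lower (90/105) because the false negatives, drowsy segments predicted as awake, enter only the recall denominator.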

The accuracy of a model can be improved in two ways: either by using a larger segment of data, which
gives the classifier more margin for training and testing, or by building the analysis on solid features.
We followed the second approach and compared four feature sets: the first method uses only PSD features,
the second only FFT features, the third only time features, and the last one is our method based on the
mixed features. As Table 1 shows, our hybrid model, based on the mixed feature domains and our optimized
DT classifier, achieved the best accuracy compared with our previous work presented at an international
conference (BML21, publication in progress) and with all the other combinations of features and
classifiers. We used a customized grid search to select the hyperparameter values of the DT classifier
that give the best accuracy, as shown in Figure 2. We then compared our method with previous ones that use
the same dataset and single-channel processing, in order to situate our method. The results are shown in
Table 2. Finally, we compared the classifiers in terms of execution time and accuracy, and the confusion
matrix output of our optimized ML model is shown in Figure 3.
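
A grid search over decision tree hyperparameters can be sketched with scikit-learn's `GridSearchCV`, as below. This is an assumption-laden illustration, not the authors' customized search: the data are synthetic stand-ins generated with `make_classification`, and the search space in `param_grid` is hypothetical because the grid used in the study is not given.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature matrix (rows = segments, columns = features).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Hypothetical search space; the grid used in the source is not given.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [4, 8, 16, 32],
    "min_samples_leaf": [1, 2, 4],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# The best tree is refit on the whole training split and scored on held-out data.
best_tree = search.best_estimator_
test_accuracy = best_tree.score(X_test, y_test)
```

After fitting, `search.best_params_` holds the selected combination and `best_tree.get_params()` returns the full parameter dictionary of the selected tree.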

Comparing the results in Table 3, we see that the execution time differs from one classifier to
another. In terms of both time and accuracy together, our optimized DT classifier is the most efficient:
its accuracy reached 96.4% with an execution time of about 53 milliseconds.


Table 1. Performance comparison between different classifiers applied on our selected features

Classifier                First method   Second method   Third method   Hybrid method
Optimized DT              51.2%          94.7%           95.0%          96.4%
DT (previous work)        49.3%          93.6%           94.3%          95.7%
SVM (Linear kernel)       49.7%          49.9%           49.4%          49.5%
SVM (Polynomial kernel)   54.6%          85.5%           93.2%          83.6%
SVM (Sigmoid kernel)      35.3%          66.8%           88.7%          66.0%
SVM (RBF kernel)          71.9%          86.5%           93.3%          87.8%
MLP                       49.8%          74.1%           48.9%          75.6%
KNN                       90.6%          92.9%           94.1%          93.1%
GP                        49.1%          86.9%           49.0%          56%

DTclassifier.get_params()

{'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'entropy',
 'max_depth': 32,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'presort': 'deprecated',
 'random_state': None,
 'splitter': 'best'}

Figure 2. Search Grid output


Table 2. Performance comparison between our proposed model and existing models using same Physionet EEG dataset

Works                              Platform used  Sampling frequency  Size of segments  Processing method  Classification method     Accuracy
Proposed                           Python         100 Hz              3 s               Hybrid             Optimized Decision Tree   96.4%
Previous work                      Python         100 Hz              3 s               Hybrid             Decision Tree             95.7%
(B and Chinara, 2021) [21]         MATLAB         100 Hz              5 s               WPT                ET                        94.45%
(Bajaj et al., 2020) [20]          -              -                   -                 TQWT               ELM                       91.8%
(Budak et al., 2019) [25]          MATLAB         250 Hz              30 s              STFT, TQWT         LSTM                      94.31%
(Belakhdar et al., 2018) [19]      MATLAB         250 Hz              30 s              FFT                ANN                       88.8%
(Ogino and Mitsukura, 2018) [26]   iPad app       512 Hz              10 s              PSD                SVM, SWLDA                72.71%


--- Execution time is : 0.0536646842956543 seconds ---
Train Accuracy : 100.0%
Test Accuracy : 96.4%
Test precision : 96.3%
Classifier's Accuracy : 96.4%
Recall : 96.7%

              precision    recall  f1-score   support

           0       0.96      0.97      0.96      1405
           1       0.97      0.96      0.96      1460

    accuracy                           0.96      2865
   macro avg       0.96      0.96      0.96      2865
weighted avg       0.96      0.96      0.96      2865

Figure 3. Output of our optimized model (confusion matrix)


Table 3. Time comparison between the different classifiers used in our method

Classifier                    Accuracy   Time (s)
Proposed (Optimized DT)       96.4%      0.053
Previous Work (DT)            95.7%      0.062
SVM (Linear kernel)           87.8%      0.985
Gaussian Process              56%        12.57
Stochastic Gradient Descent   65.5%      0.366
Multi-Layer Perceptron        75.6%      5.144
Nearest Centroid              73.4%      0.006

The final phase was to save our trained model and use it to predict the state of new subjects, in order
to validate our work and measure the prediction time. The state of the subjects used for this validation
was already known and was tested against our new hybrid model. The model predicted the state correctly for
all the given data.
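
Saving a trained model, reloading it and timing its predictions can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the data are synthetic stand-ins, `pickle` is one possible serialization choice, and only the `criterion` and `max_depth` values are taken from the parameters reported in Figure 2.

```python
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the first 300 rows play the role of training segments
# and the remaining 100 rows the role of segments from new subjects.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)
model = DecisionTreeClassifier(criterion="entropy", max_depth=32,
                               random_state=0).fit(X[:300], y[:300])

# Serialize the trained model; pickle.dump to a file works the same way.
blob = pickle.dumps(model)

# Later: restore the model and time the prediction on the unseen rows.
restored = pickle.loads(blob)
start = time.perf_counter()
predicted = restored.predict(X[300:])
elapsed_s = time.perf_counter() - start
```

Because the true state of the held-out rows is known, `predicted` can be compared with `y[300:]` to validate the restored model, and `elapsed_s` gives the prediction time.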
4. CONCLUSION

The present work proposed an optimized hybrid method for detecting driver drowsiness based on
time-frequency analysis of a single FP1 channel of EEG signals. We extracted a total of eight features from
the three domains: time, Fourier and PSD. We then trained eight ML models: MLP, GP, KNN, SVM (with its
four kernels), DT and finally our optimized DT. We compared our proposed work with our previous one and
with works based on the same dataset and on a single channel of EEG records. The added value of our model
is the improved detection performance, with an accuracy of 96.4% and a processing time of 0.053 seconds.